Global Trade-off between Code Size and Performance for Loop Unrolling on VLIW Architectures
Abstract
Many media processors [28, 7, 14, 8, 18, 27], used for compute-intensive embedded applications, are VLIW architectures that rely on the compiler to exploit instruction-level parallelism. Loop unrolling is generally used to expose instruction parallelism, but computing the unrolling factor is very difficult, as instruction cache misses and spill code can cancel the expected benefit of the transformation. Moreover, increasing the code size directly affects the cost of the embedded system. In this paper, we propose a method called UFC (Unrolling Factor computation under Constraints) to compute unrolling factors for a set of loops while taking into account code size, a major issue for embedded systems.
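The trade-off described in the abstract can be illustrated with a small sketch. This is not the UFC algorithm, whose details the abstract does not give; it is a brute-force search under stated assumptions. The loop names, body sizes, cycle estimates, and the cost model (unrolled size equals body size times factor) are all hypothetical and exist only to show how a code size budget constrains the choice of unrolling factors across several loops.

```python
from itertools import product

# Hypothetical per-loop data (not from the paper): body size in instructions
# and an estimated total cycle count for each candidate unrolling factor.
LOOPS = {
    "fir": {"body_size": 10, "cycles": {1: 1000, 2: 600, 4: 450}},
    "dct": {"body_size": 20, "cycles": {1: 2000, 2: 1500, 4: 1400}},
}


def code_size(body_size, factor):
    # Simplifying assumption: an unrolled loop is `factor` copies of its body.
    return body_size * factor


def choose_factors(loops, size_budget):
    """Return (cycles, size, factors) minimizing total cycles within the
    code size budget, or None if no combination fits."""
    names = list(loops)
    candidates = [sorted(loops[name]["cycles"]) for name in names]
    best = None
    # Exhaustive search over every combination of candidate factors.
    for factors in product(*candidates):
        size = sum(
            code_size(loops[name]["body_size"], factor)
            for name, factor in zip(names, factors)
        )
        if size > size_budget:
            continue  # violates the code size constraint
        cycles = sum(
            loops[name]["cycles"][factor]
            for name, factor in zip(names, factors)
        )
        if best is None or cycles < best[0]:
            best = (cycles, size, dict(zip(names, factors)))
    return best


if __name__ == "__main__":
    for budget in (30, 60, 120):
        print(budget, choose_factors(LOOPS, budget))
```

With these made-up numbers, a budget of 60 instructions favors unrolling both loops by 2 rather than unrolling one loop by 4, because the marginal gain of each loop's extra unrolling shrinks. Exhaustive search is only practical for a handful of loops; it is used here because it is easy to check.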
Similar resources
UFC: a Global Trade-off Strategy for Loop Unrolling for VLIW Architecture
In order to minimize code size overhead on VLIW architectures, compilers for embedded processors have to pay more attention to code optimization than to compilation time. Thus, the first demand on a compiler for embedded processors consists in spending instruction memory space for optimization only if the associated performance improvement justifies it. In this paper, we propose a novel method ba...
A Study of Loop Unrolling for VLIW-based DSP Processors
With the growing popularity of DSPs and their associated applications, cost-effective software development has become a major issue. High-level language compilers are becoming more commonplace in the DSP world. While these compilers can generate correct code for DSP architectures, there remains considerable room for performance improvements. This paper addresses issues related to DSP compilatio...
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures
Clustered organizations are becoming a common trend in the design of VLIW architectures. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduling in a single pass, which is shown to be more effective than doing first the assignment and later the scheduling. We also show that loop unro...
Code Size Aware Compilation for Real-Time Applications
Statically constructed plan of execution (POE) and aggressive instruction level parallelism (ILP) exploitation make EPIC/VLIW processors appropriate for high performance real-time systems. On the one hand, the compiler controlled POE makes the worst-case execution-time (WCET) analysis more accurate as run-time variations are minimized. On the other hand, the compiler can leverage ILP optimizati...
Self-Evaluating Compilation Applied to Loop Unrolling
Well-engineered compilers use a carefully selected set of optimizations, heuristic optimization policies, and a phase ordering to produce good machine code. Designing a compiler with one heuristic per optimization that works well with other optimization phases is a challenging task. Although compiler designers evaluate the optimization heuristics and phase ordering before deployment, compilers ...